Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segment replace test #8209

Merged
merged 2 commits into from
Feb 16, 2022

Conversation

walterddr
Copy link
Contributor

@walterddr walterddr commented Feb 16, 2022

recently there were many instability in PinotHelixResourceManagerTest.testSegmentReplacementForRefresh where it stuck at waitForSegmentsToDelete.
e.g. 2 completely different/irrelevant PR failed on this:
https://github.com/apache/pinot/runs/5194729499?check_suite_focus=true
https://github.com/apache/pinot/runs/5161381187?check_suite_focus=true

This PR moves it to the stateless test so it doesn't run together with other statefuls tests. also simplified the logic to run them sequentially

serialize segment lineage test
@walterddr walterddr force-pushed the fix_segment_replace_test branch from 64bdc0f to af4fac9 Compare February 16, 2022 16:55
@codecov-commenter
Copy link

codecov-commenter commented Feb 16, 2022

Codecov Report

Merging #8209 (1f88d68) into master (8042408) will decrease coverage by 6.64%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #8209      +/-   ##
============================================
- Coverage     70.97%   64.32%   -6.65%     
  Complexity     4313     4313              
============================================
  Files          1626     1581      -45     
  Lines         84851    83048    -1803     
  Branches      12790    12579     -211     
============================================
- Hits          60221    53424    -6797     
- Misses        20496    25788    +5292     
+ Partials       4134     3836     -298     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 67.38% <ø> (-0.02%) ⬇️
unittests2 14.10% <ø> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...ache/pinot/server/access/AccessControlFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/SegmentReloadMessage.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/core/data/manager/realtime/TimerService.java 0.00% <0.00%> (-100.00%) ⬇️
...che/pinot/controller/util/TableMetadataReader.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/minion/exception/TaskCancelledException.java 0.00% <0.00%> (-100.00%) ⬇️
...not/common/exception/HttpErrorStatusException.java 0.00% <0.00%> (-100.00%) ⬇️
... and 372 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8042408...1f88d68. Read the comment docs.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think switching it to stateless is the issue. Per the failure run, extending TIMEOUT_IN_MS to 60s should fix the test. Github Actions vm has very limited resource, so we might need to wait longer than what is needed on local box

@walterddr
Copy link
Contributor Author

done. but I think we should move it over anyway.
originally the idea was to save startup tear down time using stateful suite. but since we already have "stateless" version of the PinotHelixResourceManagerTest, there's no reason we need the stateful one because we anyway will start a new cluster

@Jackie-Jiang
Copy link
Contributor

The logic in the stateless test is slightly different, where it actually tests the tenant behavior with fake brokers and servers which doesn't apply to the segment upload where real server might be required (even thought it might not be tested). I'd suggest keeping the test unmoved to limit the change and see if changing timeout fixes the issue.

@walterddr walterddr force-pushed the fix_segment_replace_test branch from 0fc7b4b to 1f88d68 Compare February 16, 2022 20:18
@walterddr
Copy link
Contributor Author

The logic in the stateless test is slightly different, where it actually tests the tenant behavior with fake brokers and servers which doesn't apply to the segment upload where real server might be required (even thought it might not be tested). I'd suggest keeping the test unmoved to limit the change and see if changing timeout fixes the issue.

sounds good.
although i still think the test should be stateless, for one it is not suppose to test against server/broker logic in pinot-controller module which it does in the stateful test class; also it worked perfectly in stateless test.
however given we don't have a test to cover the lineage logic in broker/server separately. i would agree to keep the current behavior (though it is much slower)

@Jackie-Jiang Jackie-Jiang merged commit 950ea76 into apache:master Feb 16, 2022
xiangfu0 pushed a commit to xiangfu0/pinot that referenced this pull request Feb 23, 2022
@walterddr walterddr deleted the fix_segment_replace_test branch December 6, 2023 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants